-
Notifications
You must be signed in to change notification settings - Fork 1.4k
BFDD : BFD session stays until the last static route over nexhop #19031
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
ton31337
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the commit also (instead of TBD).
| * and the worst case is the refcount will detain us. | ||
| */ | ||
| _ptm_bfd_session_del(bs, BD_NEIGHBOR_DOWN); | ||
| //_ptm_bfd_session_del(bs, BD_NEIGHBOR_DOWN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftovers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here,
This func call ( _ptm_bfd_session_del ) looks like it has been added for a specific scenario rather to address specific bug, not sure how to handle this here.
Basically this call should be deleted, why ?
As we remove the static routes one by one, refcount gets decremented one by one and once it reaches 0, we call the pcn_free where we clean up the pcn related data structures and call _ptm_bfd_session_del where it notifies the peer about the session down and frees up BFD session related data structures.
So according to my opinion, this call is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's drop it instead of commenting.
| struct bfd_session *bs) | ||
| { | ||
| struct ptm_client_notification *pcn; | ||
| /* Lets increment the refcount in the beginning itself regardless of its a new pcn or old*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference here or how was it before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Earlier flow :
bfdd_dest_register
pcn_new
==> here we try to find whether there is already a existing pcn for the particular bsd session, if found, we used to return without updating the ref_count ( refcount updation was later pcn_lookup ) which mess up the refcount logic.
===> if we add multiple static route over the same bfd peer nh, for the subsequent addition, pcn_lookup will give the existing pcn which result in NOT updating the refcount
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now refcount incrementation has been added before the pcn_lookup which makes the refcount increment properly
| @@ -0,0 +1,29 @@ | |||
| hostname rt1 | |||
| log file /tmp/topotests/bfd_ospf_topo1.test_bfd_ospf_topo1/rt1/mgmtd.log | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop every log file statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| log file /tmp/topotests/bfd_ospf_topo1.test_bfd_ospf_topo1/rt1/bfdd.log | ||
| no service integrated-vtysh-config | ||
| ! | ||
| password 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop the password statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| log file /tmp/topotests/bfd_ospf_topo1.test_bfd_ospf_topo1/rt1/ospf6d.log | ||
| log file /tmp/topotests/bfd_ospf_topo1.test_bfd_ospf_topo1/rt1/staticd.log | ||
| log file /tmp/topotests/bfd_ospf_topo1.test_bfd_ospf_topo1/rt1/bfdd.log | ||
| no service integrated-vtysh-config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
| ! | ||
| password 1 | ||
| ! | ||
| ip route 1.1.1.2/32 10.0.1.2 bfd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No Cloudflare IPs, please. Use private ranges or documentation ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
| exit | ||
| ! | ||
| interface lo | ||
| ip address 2.2.2.2/32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, use private or documentation ranges.
| # SPDX-License-Identifier: ISC | ||
|
|
||
| # | ||
| # test_bfd_static_vrf.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the name
| # test_bfd_static_vrf.py | ||
| # Part of NetDEF Topology Tests | ||
| # | ||
| # Copyright 2025 6WIND S.A. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you behind 6wind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the copyright
c2bc60d to
88e9541
Compare
…utes over same nh Adding the topotest which verifies whether BFD session stays up as long as there is one static route pointing to nexthop bfd peer Signed-off-by: Varun Hegde <[email protected]>
This Fix makes the BFD session to stay up until the last static route going over the gateway ip which is tracked by bfd. Signed-off-by: Varun Hegde <[email protected]>
88e9541 to
93ed0ce
Compare
|
======> cmd : show bfd peers json [ ================> from the router [ ===============> expected o/p from the testcase file one of the topotest failing consistantly ( bfd_topo3 ) where expected does not match the received o/p from the router. When I checked, its failing because its not able to find "local" and/or "interface" fields ( which are in expected files but not in router gen o/p ) I dont see how my changes affected this failure. Any idea @ton31337 ? Thanks, |
| ! | ||
|
|
||
| interface lo | ||
| ip address 1.1.1.1/32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No Cloudflare, please, use private or documentation ranges.
| exit | ||
| ! | ||
| interface lo | ||
| ip address 2.2.2.2/32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, use private or documentation ranges.
| # | ||
| # test_bfd_static_topo1.py | ||
| # | ||
| # Copyright (c) 2024 by Varun Hegde |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My calendar now shows it's 2025.
| * and the worst case is the refcount will detain us. | ||
| */ | ||
| _ptm_bfd_session_del(bs, BD_NEIGHBOR_DOWN); | ||
| //_ptm_bfd_session_del(bs, BD_NEIGHBOR_DOWN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's drop it instead of commenting.
|
Hello, Could you let me know when this fix can be merged? Thanks. |
|
@varuntumbe are you still working on this? |
Hi @ton31337 , This solution is fundamentally wrong ( as I found out when one of the bfd topo test failed ). This needs a rework. I tried reworking this couple of months back but was not able to code it up to a proper solution. I am not working on this now. Apologies. thanks |
What is the problem ?
When we have more then one static route over the same nexthop with BFD, session goes down and gets cleared if we remove one of the static route.
What is the Root Cause ?
But for the subsequent addition of static route over the same nh, means pcn_lookup would give the same existing pcn skipping the increment of refcount.
so essentially, refcount remains at value 1, even if you add any number of static route over the nh.
What is the fix ?
Fix has 2 parts
Incrementing the refcount properly ( early before pcn_lookup) for every static route addition. This makes sures that refcount gets updated properly for every addition of static route
Now when we start removing the static routes one by one, we decrement the refcount ( in bfdd_dest_deregister ) and will call the pcn_free only when refcount becomes 0 ( essentially deleting the session only when there are no static routes points to nh )
Closes #19014